A Dataset Generator for Whole Genome Shotgun Sequencing

Author

  • Eugene W. Myers
Abstract

Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.
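The three stages named in the abstract (generate a target with repeats, derive a polymorphic variant, sample shotgun reads) can be illustrated with a small sketch. This is not celsim and does not reproduce its stochastic grammar. All function names and parameters below are hypothetical, reads come from the forward strand only, and the error model is limited to uniform substitutions. Each read keeps its true start position, which reflects the abstract's point that simulated data gives complete information about the true solution.

```python
import random


def random_dna(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_target(unique_len, repeat_len, copies, rng):
    """Build a target: unique segments interleaved with exact copies of one repeat.

    Hypothetical stand-in for a repeat structure; celsim's grammar is far richer.
    """
    repeat = random_dna(repeat_len, rng)
    parts = []
    for _ in range(copies):
        parts.append(random_dna(unique_len, rng))
        parts.append(repeat)
    parts.append(random_dna(unique_len, rng))
    return "".join(parts), repeat


def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`.

    Used both for a polymorphic variant of the target and for read errors.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def sample_reads(target, read_len, coverage, error_rate, rng):
    """Sample fixed-length reads at uniform random positions.

    Returns (true_start, read_sequence) pairs so results can be scored
    against the known answer.
    """
    n_reads = (coverage * len(target)) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(target) - read_len + 1)
        fragment = target[start:start + read_len]
        reads.append((start, mutate(fragment, error_rate, rng)))
    return reads


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    target, repeat = build_target(unique_len=200, repeat_len=50, copies=3, rng=rng)
    variant = mutate(target, rate=0.01, rng=rng)  # polymorphic variant of the target
    reads = sample_reads(target, read_len=100, coverage=5, error_rate=0.02, rng=rng)
    print(len(target), len(variant), len(reads))
```

Seeding the generator makes a data set reproducible, which is useful when the effect of one parameter (for example, the number of repeat copies) is studied in isolation.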

Similar resources

A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer

BACKGROUND The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequen...

Estimation of the Redundancy in Human Genome Shotgun Sequencing by a Monte-Carlo Simulation

In order to quantitatively comprehend the essence of whole genome shotgun sequencing, a Monte-Carlo simulation was carried out. It was estimated that even a vast genome such as human genome can be sequenced at a moderate redundancy ( 7) with a satisfactory accuracy (10 error rate), resulting in a high sequencing speed and much lower cost. Switching from a random process (i.e., shotgun) to a dir...

Whole-genome shotgun sequencing of a colonizing multilocus sequence type 17 Streptococcus agalactiae strain.

This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources.

Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project.

A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls in between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a s...

Analyzing WGBS with the bsseq package

This document discusses the ins and outs of an analysis of a whole-genome shotgun bisulfite sequencing (WGBS) dataset, using the BSmooth algorithm, which was first used in [1] and more formally presented and evaluated in [2]. The intention with the document is to focus on analysis-related tasks and questions. Basic usage of the bsseq package is covered in “The bsseq user’s guide”. It may be use...

Journal title:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Publication date: 1999